The Library of Congress, its digital strategy, and crowdsourcing


Screenshot of the homepage of the Library of Congress’s Crowd program


In late October, I asked the Preservation Directorate of the Library of Congress (LOC) how it decides what to digitize and whether it has a process similar to that of NARA (the National Archives and Records Administration, called the National Archives in the rest of this article), with its own digitization priorities, including work with external partners. After thanking me for my interest in the LOC’s preservation work, Jon Sweitzer-Lamme of the Preservation Directorate responded:


The Library’s digital strategy is available here: https://www.loc.gov/digital-strategy. Our 
prioritization is driven by demand, such as demand for our presidential papers collections 
like the newly released Theodore Roosevelt Papers (https://www.loc.gov/item/prn-18-132/), and preservation needs, especially if an item can’t be served to researchers
anymore due to its condition. We have excellent in-house digitization capabilities and 
also utilize external contractors and partners to digitize our content. 
That does answer my question, but unfortunately LOC’s answer did not come soon enough for a class assignment in which I asked reference questions in the same vein to different institutions (AskUsNow!, Maryland State Archives, and UMD Archives). I later posted that assignment on the Internet Archive.
This also shows that the site is made possible through a partnership with Amazon’s SES [Simple Email Service], a worrying infiltration of public institutions by the corporate world. Even so, the Crowd program runs on open source software, which is a positive.


Most exciting of all is LOC’s new Crowd program, which resembles the Citizen Archivist initiative of the National Archives. I’ve participated in the latter a bit in the past. While there are few campaigns to transcribe, review, or tag at present, since the program is in its beta stage, it will likely be expanded in the future. This is linked open data at its finest, connecting people with content and bringing them further into the process to make record usage more collaborative, going beyond past efforts.

In this way, the new program fulfills the digital strategy of LOC (without a doubt different from the one in 2000), which states that its mission is to “engage, inspire, and inform the Congress and the American people with a universal and enduring source of knowledge and creativity,” with initiatives such as this one trying to ensure that “all Americans are connected to the Library of Congress.” This is also connected to its strategic plan, which has four major goals: expanding access, enhancing services, optimizing resources, and measuring results. The digital strategy also notes the role of digital technology in fulfilling the mission of this institution, while “throwing open the treasure chest, connecting, and investing in our future.” The strategy is forward-thinking as well, stating that:

The Library’s content, programs, and expertise are national treasures...We will make that
content available and accessible to more people, work carefully to respect the
expectations of the Congress and the rights of creators, and support the use of our content
in software-enabled research, art, exploration, and learning. The Library will continue to
build a universal and enduring source of knowledge and creativity...We will expedite the
availability of newly acquired or created content to the web and on-site access
systems...We will explore creative solutions to reduce the barriers to material while
respecting the rights of creators, the desires of our donors, and our other legal and ethical
responsibilities...We will continue to enable computational use of our content and
metadata...The Library offers an incredible wealth of content, programs, and services to
Congress and the American people. We strive to connect with more users by making
those services and content accessible for all... Many of the Library’s digital users come
directly to our websites to discover content. To expose even more people to the Library’s
content and services, we will bring digital content to users by making more of our
material available in other websites and apps that they are already using...We will
continue to participate in professional organizations and cooperatives that expand our
perspectives and enable us to share our experiences. Additionally, developing partners in
industry can allow us to connect the Library with new areas of expertise and
resources...We will cultivate an innovation culture by empowering our staff, who have
expertise in a wide range of subject areas, including the work of Congress, United States
copyright law, American and foreign law, and our collections...Our plans for the future
must entail preserving and protecting our collections and content...While we plan for our
future, we are also paying close attention to innovations and trends that will present
future challenges and opportunities. Newer tools, such as augmented and virtual reality,
computer vision, natural language processing, and machine learning, are already
transforming how we live and work.
Screenshot of the opening section of LOC’s digital strategy


From a quick online search, there aren’t many other articles on this subject,¹ but all the ones I found are relatively positive, although some are more critical than others. Roll Call, in its article on the subject, described how the digital strategy is “digital forward,” advocated strongly by Librarian of Congress Carla Hayden (who heads LOC and formerly headed the Pratt Library in Baltimore) and by Kate Zwaard, the Director of Digital Strategy. Most interesting in this article was not that Accenture, a huge contractor, won a contract “to build the long-planned new data center” for LOC, or that the plan includes “employing user-centered design to invite digital and physical visitors to explore more offerings,” but that the organization has been stuck in the past and is trying to shed it, with problems ranging from “a computing system built in the 1970s to static processes for staff.” Having a 21st-century computing system is important for LOC, which holds over 167 million items in collections that sit on “approximately 838 miles of bookshelves,” making it the “largest library in the world.”

¹ Through a further search, I found a snippet from the report on infodocket, the dh+lib blog of the ALA, and the Digital Journal.

FedScoop also wrote about the digital strategy, noting that “The Library of Congress...is interested in exploring what artificial intelligence and similar technologies can do for its mission,” and that this focus on digital aspects is not “out of the blue,” as LOC launched labs.loc.gov, “a home for digital experiments...last year...[and] it...recently began experimenting with geographic information systems mapping as a way to explore collections online.” Both are positive developments, to say the least.

Finally, there is Cory Doctorow of Boing Boing, a site that often runs short articles with little content beyond the document(s) being quoted. Regardless, Doctorow describes how the digital strategy supports “data-driven research with giant bulk-downloadable corpuses of materials and metadata...crowdsourc[ing] the acquisition of new materials...[and] preserv[ing] digital assets with the same assiduousness that the Library has shown with its physical collection for centuries,” among other aspects. Interestingly, he notes that LOC has an “outsized role” in the current digital era because it contains the Copyright Office, which is “patient zero in the epidemic of terrible internet law that reaches into every corner of our lives.” This clashes with the fact that Carla Hayden, the Librarian of Congress, “is the most freedom-friendly, internet-friendly, access-friendly leader in the Library’s history, replacing unfit leaders who were brought down in grotesque corruption scandals.” Even so, her leadership has fallen short in Doctorow’s view, because “the Copyright Office is still a creature of Big Content, and it has direct oversight over your ability to modify, repair, sell, and use all of your digital property.” Still, he argues that

...this digital strategy is a very bright light, but it shines in a dark and menacing cave. I
love the Library — I love its work, its collections, its diligent and thoughtful staff, its
magnificent building. But for all that, the Library has become a locus of terrible policy
that runs directly counter to its mission. The contradiction between the Library’s mission
and its real role in policy has never been more clear than it is in this wonderful
document.

That brings me to the end of this article. What are your thoughts on this new digital 
strategy of LOC and its new Crowd program? 
² James Tanner of Genealogy’s Star makes a similar point, but says that LOC is “certainly not the leader in the number and value of their online offerings,” since “the recent history of the Library of Congress is far from promising,” citing the closure of the Local History and Genealogy Reading Room in 2013 and the “inherent contradiction in the current efforts of the Library of Congress due to the fact that they are also the agency responsible for the controversial access policies inherent in the United States Copyright Law because the Copyright Office is an integral part of the Library.” As a result, Tanner argues, due to “Congressional action, use and access to many valuable research materials have been overwhelmingly restricted,” adding that “policies and budgetary constraints at both the Library of Congress and the National Archives have severely limited the number and availability of digitized records from both institutions. It would be a huge change if this present plan includes real changes in the number and availability to access items in both institutions collections.” Still, he is not entirely pessimistic, saying that “it will be interesting to see what will happen, although I do not expect any significant changes during what is left of my lifetime,” and suggesting that the Internet Archive “may become the largest library in the world considering its growth during the past few months and years assuming they catch up with the National Library of Australia.”


